Creating a safe function
safely() is an adverb; it takes a verb and modifies it. That is, it takes a function as an argument and it returns a function as its output. The function that is returned is modified so it never throws an error (and never stops the rest of your computation!).
Instead, it always returns a list with two elements:
result is the original result. If there was an error, this will be NULL.
error is an error object. If the operation was successful, this will be NULL.
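To see both shapes of that list without depending on a network connection, here is a minimal sketch that wraps base R's log() instead of readLines(). log() and the inputs 10 and "a" are illustrative stand-ins, not part of the original exercise: log(10) succeeds, and log("a") throws an error.

```r
library(purrr)

# Wrap log() so that errors are captured instead of thrown
safe_log <- safely(log)

# Successful call: $result holds the value, $error is NULL
ok <- safe_log(10)
str(ok)

# Failing call: log("a") errors, so $result is NULL and $error holds the condition
bad <- safe_log("a")
str(bad)
conditionMessage(bad$error)
```

The exact wording of the error message depends on your R locale, which is also why the outputs below show localized messages from file().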
Let’s try to make the readLines() function safe.
library(purrr)
# Create safe_readLines() by passing readLines() to safely()
safe_readLines <- safely(readLines)
# Call safe_readLines() on "http://example.org"
example_lines <- safe_readLines("http://example.org")
## Warning in file(con, "r"): cannot open URL 'http://example.org': HTTP
## status was '403 Forbidden'
example_lines
## $result
## NULL
##
## $error
## <simpleError in file(con, "r"): cannot open the connection>
# Call safe_readLines() on "http://asdfasdasdkfjlda"
nonsense_lines <- safe_readLines("http://asdfasdasdkfjlda")
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name
## or address could not be resolved'
nonsense_lines
## $result
## NULL
##
## $error
## <simpleError in file(con, "r"): cannot open the connection>
Using map safely
One feature of safely() is that it plays nicely with the map() functions. Consider this list containing the two URLs from the last exercise, plus one additional URL to make things more interesting:
urls <- list(
example = "http://example.org",
rproj = "http://www.r-project.org",
asdf = "http://asdfasdasdkfjlda"
)
We are interested in quickly downloading the HTML file at each URL. You might try:
map(urls, readLines)
But this stops with an error, Error in file(con, "r") : cannot open the connection, and returns no output for any of the URLs. Go on, try it!
We can solve this problem by using our safe_readLines() instead.
# Define safe_readLines()
safe_readLines <- safely(readLines)
# Use the safe_readLines() function with map(): html
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): cannot open URL 'http://example.org': HTTP
## status was '403 Forbidden'
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name
## or address could not be resolved'
# Call str() on html
str(html)
## List of 3
## $ example:List of 2
## ..$ result: NULL
## ..$ error :List of 2
## .. ..$ message: chr "cannot open the connection"
## .. ..$ call : language file(con, "r")
## .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
## $ rproj :List of 2
## ..$ result: chr [1:124] "<!DOCTYPE html>" "<html lang=\"en\">" " <head>" " <meta charset=\"utf-8\">" ...
## ..$ error : NULL
## $ asdf :List of 2
## ..$ result: NULL
## ..$ error :List of 2
## .. ..$ message: chr "cannot open the connection"
## .. ..$ call : language file(con, "r")
## .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# Extract the result from the "example" element (NULL here, because that request failed with 403 Forbidden)
html[["example"]][["result"]]
## NULL
# Extract the error from one of the unsuccessful elements
html[["asdf"]][["error"]]
## <simpleError in file(con, "r"): cannot open the connection>
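The same map() plus safely() pattern can be tried offline. This is a minimal sketch that uses log() as a stand-in for readLines(); the inputs list is hypothetical. It also shows safely()'s otherwise argument, which sets the value stored in $result when a call fails (NULL by default).

```r
library(purrr)

# Hypothetical inputs: two that work with log(), one that does not
inputs <- list(one = 1, ten = 10, bad = "a")

# Default: a failed element gets result = NULL
out_default <- map(inputs, safely(log))

# otherwise = NA_real_: a failed element gets result = NA instead
out_na <- map(inputs, safely(log, otherwise = NA_real_))

out_default[["bad"]][["result"]]  # NULL
out_na[["bad"]][["result"]]       # NA
```

Using NA instead of NULL can make it easier to simplify the results into an atomic vector later, because NULL elements are dropped when a list is unlisted.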
Working with safe output
We now have output that contains the HTML for each URL on which readLines() succeeded (only rproj in this run, because example.org returned 403 Forbidden) and the errors for the others. But the output isn’t easy to work with, since the results and errors are buried in the innermost level of the list.
purrr provides a function, transpose(), that reshapes a list so the innermost level becomes the outermost level. In other words, it turns a list of lists “inside-out”. Consider the following list:
nested_list <- list(
x1 = list(a = 1, b = 2),
x2 = list(a = 3, b = 4)
)
If I need to extract the a element in x1, I could do nested_list[["x1"]][["a"]]. However, if I transpose the list first, the order of subsetting reverses. That is, to extract the same element, I could also do transpose(nested_list)[["a"]][["x1"]].
This is handy for safe output, since we can grab all the results, or all the errors, in a single step.
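A quick way to convince yourself of the reversal is to run it on nested_list directly. This sketch only uses the list defined above; note that newer purrr releases (1.0.0 and later) mark transpose() as superseded by list_transpose(), but transpose() still works.

```r
library(purrr)

nested_list <- list(
  x1 = list(a = 1, b = 2),
  x2 = list(a = 3, b = 4)
)

# Inner names (a, b) become the outer level; outer names (x1, x2) move inside
flipped <- transpose(nested_list)
str(flipped)

# Both paths reach the same element
nested_list[["x1"]][["a"]]
flipped[["a"]][["x1"]]
```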
# Define safe_readLines() and html
safe_readLines <- safely(readLines)
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): cannot open URL 'http://example.org': HTTP
## status was '403 Forbidden'
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name
## or address could not be resolved'
# Examine the structure of transpose(html)
str(transpose(html))
## List of 2
## $ result:List of 3
## ..$ example: NULL
## ..$ rproj : chr [1:124] "<!DOCTYPE html>" "<html lang=\"en\">" " <head>" " <meta charset=\"utf-8\">" ...
## ..$ asdf : NULL
## $ error :List of 3
## ..$ example:List of 2
## .. ..$ message: chr "cannot open the connection"
## .. ..$ call : language file(con, "r")
## .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
## ..$ rproj : NULL
## ..$ asdf :List of 2
## .. ..$ message: chr "cannot open the connection"
## .. ..$ call : language file(con, "r")
## .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# Extract the results: res
res <- transpose(html)[["result"]]
# Extract the errors: errs
errs <- transpose(html)[["error"]]
Working with errors and results
What you do with the errors and results is up to you. Commonly, though, you’ll want to collect the results for the elements that were successful and examine the inputs for those that weren’t.
# Initialize some objects
safe_readLines <- safely(readLines)
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): cannot open URL 'http://example.org': HTTP
## status was '403 Forbidden'
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name
## or address could not be resolved'
res <- transpose(html)[["result"]]
errs <- transpose(html)[["error"]]
# Create a logical vector is_ok
is_ok <- map_lgl(errs, is.null)
# Extract the successful results
res[is_ok]
## $rproj
## [1] "<!DOCTYPE html>"
## [2] "<html lang=\"en\">"
## [3] " <head>"
## [4] " <meta charset=\"utf-8\">"
## [5] " <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">"
## [6] " <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"
## [7] " <title>R: The R Project for Statistical Computing</title>"
## [8] ""
## [9] " <link rel=\"icon\" type=\"image/png\" href=\"/favicon-32x32.png\" sizes=\"32x32\" />"
## [10] " <link rel=\"icon\" type=\"image/png\" href=\"/favicon-16x16.png\" sizes=\"16x16\" />"
## [11] ""
## [12] " <!-- Bootstrap -->"
## [13] " <link href=\"/css/bootstrap.min.css\" rel=\"stylesheet\">"
## [14] " <link href=\"/css/R.css\" rel=\"stylesheet\">"
## [15] ""
## [16] " <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->"
## [17] " <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->"
## [18] " <!--[if lt IE 9]>"
## [19] " <script src=\"https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js\"></script>"
## [20] " <script src=\"https://oss.maxcdn.com/respond/1.4.2/respond.min.js\"></script>"
## [21] " <![endif]-->"
## [22] " </head>"
## [23] " <body>"
## [24] " <div class=\"container page\">"
## [25] " <div class=\"row\">"
## [26] " <div class=\"col-xs-12 col-sm-offset-1 col-sm-2 sidebar\" role=\"navigation\">"
## [27] "<div class=\"row\">"
## [28] "<div class=\"col-xs-6 col-sm-12\">"
## [29] "<p><a href=\"/\"><img src=\"/Rlogo.png\" width=\"100\" height=\"78\" alt = \"R\" /></a></p>"
## [30] "<p><small><a href=\"/\">[Home]</a></small></p>"
## [31] "<h2 id=\"download\">Download</h2>"
## [32] "<p><a href=\"http://cran.r-project.org/mirrors.html\">CRAN</a></p>"
## [33] "<h2 id=\"r-project\">R Project</h2>"
## [34] "<ul>"
## [35] "<li><a href=\"/about.html\">About R</a></li>"
## [36] "<li><a href=\"/logo/\">Logo</a></li>"
## [37] "<li><a href=\"/contributors.html\">Contributors</a></li>"
## [38] "<li><a href=\"/news.html\">What’s New?</a></li>"
## [39] "<li><a href=\"/bugs.html\">Reporting Bugs</a></li>"
## [40] "<li><a href=\"/conferences/\">Conferences</a></li>"
## [41] "<li><a href=\"/search.html\">Search</a></li>"
## [42] "<li><a href=\"/mail.html\">Get Involved: Mailing Lists</a></li>"
## [43] "<li><a href=\"http://developer.R-project.org\">Developer Pages</a></li>"
## [44] "<li><a href=\"https://developer.r-project.org/Blog/public/\">R Blog</a></li>"
## [45] "</ul>"
## [46] "</div>"
## [47] "<div class=\"col-xs-6 col-sm-12\">"
## [48] "<h2 id=\"r-foundation\">R Foundation</h2>"
## [49] "<ul>"
## [50] "<li><a href=\"/foundation/\">Foundation</a></li>"
## [51] "<li><a href=\"/foundation/board.html\">Board</a></li>"
## [52] "<li><a href=\"/foundation/members.html\">Members</a></li>"
## [53] "<li><a href=\"/foundation/donors.html\">Donors</a></li>"
## [54] "<li><a href=\"/foundation/donations.html\">Donate</a></li>"
## [55] "</ul>"
## [56] "<h2 id=\"help-with-r\">Help With R</h2>"
## [57] "<ul>"
## [58] "<li><a href=\"/help.html\">Getting Help</a></li>"
## [59] "</ul>"
## [60] "<h2 id=\"documentation\">Documentation</h2>"
## [61] "<ul>"
## [62] "<li><a href=\"http://cran.r-project.org/manuals.html\">Manuals</a></li>"
## [63] "<li><a href=\"http://cran.r-project.org/faqs.html\">FAQs</a></li>"
## [64] "<li><a href=\"http://journal.r-project.org\">The R Journal</a></li>"
## [65] "<li><a href=\"/doc/bib/R-books.html\">Books</a></li>"
## [66] "<li><a href=\"/certification.html\">Certification</a></li>"
## [67] "<li><a href=\"/other-docs.html\">Other</a></li>"
## [68] "</ul>"
## [69] "<h2 id=\"links\">Links</h2>"
## [70] "<ul>"
## [71] "<li><a href=\"http://www.bioconductor.org\">Bioconductor</a></li>"
## [72] "<li><a href=\"/other-projects.html\">Related Projects</a></li>"
## [73] "<li><a href=\"/gsoc.html\">GSoC</a></li>"
## [74] "</ul>"
## [75] "</div>"
## [76] "</div>"
## [77] " </div>"
## [78] " <div class=\"col-xs-12 col-sm-7\">"
## [79] " <h1>The R Project for Statistical Computing</h1>"
## [80] "<h2 id=\"getting-started\">Getting Started</h2>"
## [81] "<p>R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To <strong><a href=\"http://cran.r-project.org/mirrors.html\">download R</a></strong>, please choose your preferred <a href=\"http://cran.r-project.org/mirrors.html\">CRAN mirror</a>.</p>"
## [82] "<p>If you have questions about R like how to download and install the software, or what the license terms are, please read our <a href=\"http://cran.R-project.org/faqs.html\">answers to frequently asked questions</a> before you send an email.</p>"
## [83] "<h2 id=\"news\">News</h2>"
## [84] "<ul>"
## [85] "<li><p><a href=\"https://cran.r-project.org/src/base-prerelease\"><strong>R version 3.6.0 (Planting of a Tree) prerelease versions</strong></a> will appear starting Tuesday 2019-03-26. Final release is scheduled for Friday 2019-04-26.</p></li>"
## [86] "<li><p>useR! 2020 will take place in St. Louis, Missouri, USA.</p></li>"
## [87] "<li><p><a href=\"https://cran.r-project.org/src/base/R-3\"><strong>R version 3.5.3 (Great Truth)</strong></a> has been released on 2019-03-11.</p></li>"
## [88] "<li><p>The R Foundation Conference Committee has released a <a href=\"https://www.r-project.org/useR-2020_call.html\">call for proposals</a> to host useR! 2020 in North America.</p></li>"
## [89] "<li><p>You can now support the R Foundation with a renewable subscription as a <a href=\"https://www.r-project.org/foundation/donations.html\">supporting member</a></p></li>"
## [90] "<li><p>The R Foundation has been awarded the Personality/Organization of the year 2018 award by the professional association of German market and social researchers.</p></li>"
## [91] "</ul>"
## [92] "<h2 id=\"news-via-twitter\">News via Twitter</h2>"
## [93] "<a class=\"twitter-timeline\""
## [94] " href=\"https://twitter.com/_R_Foundation?ref_src=twsrc%5Etfw\""
## [95] " data-width=\"400\""
## [96] " data-show-replies=\"false\""
## [97] " data-chrome=\"noheader,nofooter,noborders\""
## [98] " data-dnt=\"true\""
## [99] " data-tweet-limit=\"3\">News from the R Foundation</a>"
## [100] "<script async"
## [101] " src=\"https://platform.twitter.com/widgets.js\""
## [102] " charset=\"utf-8\"></script>"
## [103] "<!--- (Boilerplate for release run-in)"
## [104] "- [**R version 3.1.3 (Smooth Sidewalk) prerelease versions**](http://cran.r-project.org/src/base-prerelease) will appear starting February 28. Final release is scheduled for 2015-03-09."
## [105] "-->"
## [106] " </div>"
## [107] " </div>"
## [108] " <div class=\"raw footer\">"
## [109] " © The R Foundation. For queries about this web site, please contact"
## [110] "\t<script type='text/javascript'>"
## [111] "<!--"
## [112] "var s=\"=b!isfg>#nbjmup;xfcnbtufsAs.qspkfdu/psh#?uif!xfcnbtufs=0b?\";"
## [113] "m=\"\"; for (i=0; i<s.length; i++) {if(s.charCodeAt(i) == 28){m+= '&';} else if (s.charCodeAt(i) == 23) {m+= '!';} else {m+=String.fromCharCode(s.charCodeAt(i)-1);}}document.write(m);//-->"
## [114] "\t</script>;"
## [115] " for queries about R itself, please consult the "
## [116] " <a href=\"help.html\">Getting Help</a> section."
## [117] " </div>"
## [118] " </div>"
## [119] " <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->"
## [120] " <script src=\"https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js\"></script>"
## [121] " <!-- Include all compiled plugins (below), or include individual files as needed -->"
## [122] " <script src=\"/js/bootstrap.min.js\"></script>"
## [123] " </body>"
## [124] "</html>"
# Find the URLs that were unsuccessful
urls[!is_ok]
## $example
## [1] "http://example.org"
##
## $asdf
## [1] "http://asdfasdasdkfjlda"
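The whole workflow (safely(), map(), transpose(), then filtering with a logical vector) can be checked offline. This sketch substitutes log() for readLines() and a hypothetical inputs list for urls, so nothing depends on which websites are reachable.

```r
library(purrr)

# Hypothetical inputs standing in for the URLs: two good, one bad
inputs <- list(one = 1, ten = 10, bad = "a")

# Run every call safely, then flip the list so $result and $error are on top
out <- transpose(map(inputs, safely(log)))

# TRUE wherever no error was recorded
is_ok <- map_lgl(out[["error"]], is.null)

# Results for the successful inputs, and the inputs that failed
good <- out[["result"]][is_ok]
failed_inputs <- inputs[!is_ok]

str(good)
str(failed_inputs)
```

Subsetting the original inputs with !is_ok relies on map() preserving the order and names of its input, so the logical vector lines up with inputs element by element.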
Getting started with pmap()
We’ll use random number generation as an example throughout the remaining exercises in this chapter. To get started, let’s imagine simulating 5 random numbers from a Normal distribution. You can do this in R with the rnorm() function. For example, to generate 5 random numbers from a Normal distribution with mean zero and standard deviation one (the defaults), we can do:
rnorm(n = 5)
Now, imagine you want to do this three times, but each time with a different sample size. You already know how! Let’s use the map() function to get it done.
# Create a list n containing the values: 5, 10, and 20
n <- list(5, 10, 20)
# Call map() on n with rnorm() to simulate three samples
map(n, rnorm)
## [[1]]
## [1] -0.23427682 0.78945676 0.83024988 0.07964683 -0.46313623
##
## [[2]]
## [1] 0.28328339 0.44831089 -0.33852985 -0.77580026 -1.87501679
## [6] -0.85550274 -0.08992203 -0.14353417 1.21706024 0.21080106
##
## [[3]]
## [1] 1.557140184 0.264640165 -0.648813598 0.002256711 1.307312974
## [6] 0.638459570 -2.276195599 0.266393116 -0.542093348 -1.550621351
## [11] -0.622051243 -0.020746892 0.353698104 0.401880872 1.098331111
## [16] -0.467605744 1.262039965 1.480183939 0.691043432 0.466394157
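map(n, rnorm) passes each element of n as the first argument of rnorm(), one element after another. A small sketch makes that concrete: with the same seed (42 is an arbitrary choice), it produces exactly the same numbers as three hand-written calls.

```r
library(purrr)

n <- list(5, 10, 20)

# map() calls rnorm(n[[1]]), rnorm(n[[2]]), rnorm(n[[3]]) in order
set.seed(42)
via_map <- map(n, rnorm)

# The same three calls written out by hand, starting from the same seed
set.seed(42)
by_hand <- list(rnorm(5), rnorm(10), rnorm(20))

identical(via_map, by_hand)  # TRUE
lengths(via_map)
```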
Mapping over two arguments
Ok, but now imagine we don’t just want to vary the sample size, we also want to vary the mean. The mean can be specified in rnorm() by the argument mean. Now there are two arguments to rnorm() we want to vary: n and mean.
The map2() function is designed exactly for this purpose; it allows iteration over two objects. The first two arguments to map2() are the objects to iterate over and the third argument .f is the function to apply.
Let’s use map2() to simulate three samples with different sample sizes and different means.
# Initialize n
n <- list(5, 10, 20)
# Create a list mu containing the values: 1, 5, and 10
mu <- list(1, 5, 10)
# Edit to call map2() on n and mu with rnorm() to simulate three samples
map2(n, mu, rnorm)
## [[1]]
## [1] 1.596001 1.694025 1.044146 3.241228 1.604315
##
## [[2]]
## [1] 4.191995 5.230428 6.542720 3.052835 5.111589 5.550276 3.904007
## [8] 6.325946 5.300712 3.966187
##
## [[3]]
## [1] 9.256437 8.895476 9.960877 9.950819 10.163150 10.785628 10.206201
## [8] 9.970207 9.526366 13.396723 9.661165 10.080031 9.254227 9.700249
## [15] 8.691257 10.618311 9.797848 10.978293 8.544279 10.521461
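map2() pairs the i-th element of n with the i-th element of mu and passes them, in that order, to rnorm()'s first two arguments (n and mean). A seeded sketch shows the equivalence with explicit calls; the seed value is arbitrary.

```r
library(purrr)

n <- list(5, 10, 20)
mu <- list(1, 5, 10)

# map2() calls rnorm(5, 1), rnorm(10, 5), rnorm(20, 10) in order
set.seed(1)
via_map2 <- map2(n, mu, rnorm)

set.seed(1)
by_hand <- list(rnorm(5, 1), rnorm(10, 5), rnorm(20, 10))

identical(via_map2, by_hand)  # TRUE
```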
Mapping over more than two arguments
But wait, there’s another argument to rnorm() we might want to vary: sd, the standard deviation of the Normal distribution. You might think there is a map3() function, but there isn’t. Instead purrr provides a pmap() function that iterates over 2 or more arguments.
First, let’s take a look at pmap() for the situation we just solved: iterating over two arguments. Instead of providing each item to iterate over as arguments, pmap() takes a list of arguments as its input. For example, we could replicate our previous example, iterating over both n and mu with the following:
n <- list(5, 10, 20)
mu <- list(1, 5, 10)
pmap(list(n, mu), rnorm)
## [[1]]
## [1] 1.616776 1.290233 2.074793 1.068007 0.884161
##
## [[2]]
## [1] 4.488230 4.077774 5.674381 5.818184 4.524794 4.747444 5.018365
## [8] 5.294642 6.846630 4.551111
##
## [[3]]
## [1] 8.589465 9.778773 11.126762 10.734834 12.701065 8.227599 11.776752
## [8] 10.677510 10.187909 10.159053 9.343368 10.463080 9.923501 10.504603
## [15] 10.673079 9.385409 9.201400 11.845225 11.726828 10.047659
Notice how we had to put our two items to iterate over (n and mu) into a list.
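For two inputs, pmap() with an unnamed list and map2() should make the same calls in the same order. This sketch checks that with a shared, arbitrary seed.

```r
library(purrr)

n <- list(5, 10, 20)
mu <- list(1, 5, 10)

set.seed(2019)
via_pmap <- pmap(list(n, mu), rnorm)

set.seed(2019)
via_map2 <- map2(n, mu, rnorm)

identical(via_pmap, via_map2)  # TRUE
```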
Let’s expand this code to iterate over varying standard deviations too.
# Initialize n and mu
n <- list(5, 10, 20)
mu <- list(1, 5, 10)
# Create a sd list with the values: 0.1, 1 and 0.1
sd <- list(0.1, 1, 0.1)
# Edit this call to pmap() to iterate over the sd list as well
pmap(list(n, mu, sd), rnorm)
## [[1]]
## [1] 1.0353849 0.8693024 0.8271904 0.7420778 0.8118752
##
## [[2]]
## [1] 4.138805 5.776896 4.781821 4.395533 4.905831 5.732386 4.945166
## [8] 4.931679 4.331622 3.923911
##
## [[3]]
## [1] 9.974093 9.892556 10.043132 10.059135 9.990430 10.103529 9.857245
## [8] 9.985773 9.915744 10.146341 9.908526 9.844211 9.972959 9.992261
## [15] 9.990778 9.982418 10.013901 10.106738 10.142751 10.044827
Argument matching
Compare the following two calls to pmap():
pmap(list(n, mu, sd), rnorm)
## [[1]]
## [1] 0.8911589 0.9571111 1.0143516 0.7453221 1.0203416
##
## [[2]]
## [1] 5.277511 5.381478 3.111510 3.358776 5.707939 3.070080 3.965042
## [8] 4.755473 4.552384 3.970299
##
## [[3]]
## [1] 10.030755 9.894993 9.955491 9.973041 9.754193 10.079449 9.937878
## [8] 9.847082 10.089314 10.021485 10.067825 10.125820 10.091152 10.145301
## [15] 10.108246 10.079696 9.805575 10.103789 9.817559 9.975337
pmap(list(mu, n, sd), rnorm)
## [[1]]
## [1] 4.952815
##
## [[2]]
## [1] 9.153225 9.058165 8.836794 9.313634 10.044796
##
## [[3]]
## [1] 19.91207 19.79977 19.95534 19.85523 19.91151 20.16028 19.85607
## [8] 19.95187 19.89385 20.05069
What’s the difference? By default, pmap() matches the elements of the list to the arguments of the function by position. In the first case, n is matched to the n argument of rnorm(), mu to the mean argument, and sd to the sd argument. In the second case, mu gets matched to the n argument of rnorm() (and n to mean), which is clearly not what we intended!
Instead of relying on this positional matching, a safer alternative is to provide names in our list. The name of each element should be the argument name we want to match it to.
Let’s fix up that second call.
# Name the elements of the argument list
pmap(list(mean = mu, n = n, sd = sd), rnorm)
## [[1]]
## [1] 0.8604371 0.8800275 0.8784867 1.0167480 1.0804304
##
## [[2]]
## [1] 5.207695 3.859108 4.721418 5.647814 5.547493 4.363869 4.798266
## [8] 5.036228 5.923423 4.547289
##
## [[3]]
## [1] 9.927156 9.873410 9.964369 9.929337 9.920661 9.944723 9.871033
## [8] 9.970318 9.813902 9.861765 10.139435 9.897397 9.936863 9.869877
## [15] 10.135849 9.986748 10.042639 9.882784 10.265884 10.079541
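The sample lengths make the matching rule easy to check without inspecting the random values. The sketch below compares the positional call, the named call, and a variant that is not in the original exercise: a data frame with named columns (the hypothetical sim_params). Because a data frame is a list of columns, pmap() iterates over its rows and matches columns to arguments by name.

```r
library(purrr)

n  <- list(5, 10, 20)
mu <- list(1, 5, 10)
sd <- list(0.1, 1, 0.1)

# Positional: mu lands in rnorm()'s n argument, so the lengths are 1, 5, 10
positional <- pmap(list(mu, n, sd), rnorm)
lengths(positional)

# Named: each element goes to the argument with the same name, giving 5, 10, 20
named <- pmap(list(mean = mu, n = n, sd = sd), rnorm)
lengths(named)

# Hypothetical parameter table: one row per simulation, columns named after rnorm()'s arguments
sim_params <- data.frame(n = c(5, 10, 20), mean = c(1, 5, 10), sd = c(0.1, 1, 0.1))
from_rows <- pmap(sim_params, rnorm)
lengths(from_rows)
```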
Mapping over functions and their arguments
Sometimes it’s not the arguments to a function you want to iterate over, but a set of functions themselves. Imagine that instead of varying the parameters to rnorm() we want to simulate from different distributions, say, using rnorm(), runif(), and rexp(). How do we iterate over calling these functions?
In purrr, this is handled by the invoke_map() function. The first argument is a list of functions, or of function names given as strings. In our example, something like:
funs <- list("rnorm", "runif", "rexp")
The second argument specifies the arguments to the functions. In the simplest case, all the functions take the same argument, and we can specify it directly, relying on ... to pass it to each function. In this case, call each function with the argument n = 5:
invoke_map(funs, n = 5)
## [[1]]
## [1] -2.2079088 1.6008993 1.2993616 -1.6601251 -0.7898239
##
## [[2]]
## [1] 0.03551629 0.49458359 0.69651414 0.83496444 0.05718173
##
## [[3]]
## [1] 0.5929766 1.6278008 2.2331468 0.9360883 0.2407428
In more complicated cases, the functions may take different arguments, or we may want to pass different values to each function. In this case, we need to supply invoke_map() with a list, where each element specifies the arguments to the corresponding function.
Let’s use this approach to simulate three samples from the following three distributions: Normal(10, 1), Uniform(0, 5), and Exponential(5).
# Define list of functions
funs <- list("rnorm", "runif", "rexp")
# Parameter list for rnorm()
rnorm_params <- list(mean = 10)
# Add a min element with value 0 and max element with value 5
runif_params <- list(min = 0, max = 5)
# Add a rate element with value 5
rexp_params <- list(rate = 5)
# Define params for each function
params <- list(
rnorm_params,
runif_params,
rexp_params
)
# Call invoke_map() on funs supplying params and setting n to 5
invoke_map(funs, params, n = 5)
## [[1]]
## [1] 9.967989 8.096374 9.801156 9.861206 8.080147
##
## [[2]]
## [1] 1.781383 4.567286 4.433149 3.338713 1.990042
##
## [[3]]
## [1] 0.07862809 0.31788282 0.09767405 0.02190067 0.16558911
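Newer purrr releases retire invoke_map() (it was retired in purrr 0.3.0, the version used here, and is deprecated in later releases). One possible replacement, sketched here as an assumption rather than the package's recommended recipe, is to pair each function name with its parameter list using map2() and to call it with base R's do.call(). The shared n = 5 is added to every parameter list.

```r
library(purrr)

funs <- list("rnorm", "runif", "rexp")
params <- list(
  list(mean = 10),
  list(min = 0, max = 5),
  list(rate = 5)
)

# Like invoke_map(funs, params, n = 5): call each named function with
# the shared n = 5 plus its own parameters
sims_alt <- map2(funs, params, function(fn_name, fn_args) {
  do.call(fn_name, c(list(n = 5), fn_args))
})

# Like invoke_map(funs, n = 5): every function gets the same arguments
same_args <- map(funs, function(fn_name) do.call(fn_name, list(n = 5)))

str(sims_alt)
```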
Walk
walk() operates just like map(), except that it’s designed for functions whose return value you don’t need. You use walk() for functions with side effects like printing, plotting or saving.
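A side effect that needs no graphics device makes this easy to see. In the sketch below, cat() is called only for the text it prints. The labels vector is illustrative, and the last line confirms that walk() hands back its input rather than the results of cat().

```r
library(purrr)

labels <- c("Normal", "Uniform", "Exp")

# cat() is called purely for its side effect of printing a line
returned <- walk(labels, function(x) cat("Sample:", x, "\n"))

# walk() returns its input invisibly, not the (NULL) outputs of cat()
identical(returned, labels)  # TRUE
```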
Let’s check that our simulated samples are in fact what we think they are by plotting a histogram for each one.
# Define list of functions
funs <- list(Normal = "rnorm", Uniform = "runif", Exp = "rexp")
# Define params
params <- list(
Normal = list(mean = 10),
Uniform = list(min = 0, max = 5),
Exp = list(rate = 5)
)
# Assign the simulated samples to sims
sims <- invoke_map(funs, params, n = 50)
# Use walk() to make a histogram of each element in sims
sims %>% walk(hist)
Take a quick look through the three histograms. Do they have any problems?
Walking over two or more arguments
Those histograms were pretty good, but they really needed better breaks for the bins on the x-axis. That means we need to vary two arguments to hist(): x and breaks. Remember map2()? That allowed us to iterate over two arguments. Guess what? There is a walk2(), too!
Let’s use walk2() to improve those histograms with better breaks.
# Replace with reasonable breaks for each sample
breaks_list <- list(
Normal = seq(6, 16, 0.5),
Uniform = seq(0, 5, 0.25),
Exp = seq(0, 1.5, 0.1)
)
# Use walk2() to make histograms with the right breaks
walk2(sims, breaks_list, hist)
Don’t worry about those ugly labels. We’ll fix them later.
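Saving is another common walk2() side effect. This sketch is not part of the original exercise: it pairs each data frame with an output path and calls write.csv(table, path) for each pair. The tables come from R's built-in datasets, and the file names in tempdir() are hypothetical. Extra arguments such as row.names = FALSE pass through ... to every call.

```r
library(purrr)

# Two small data frames from datasets that ship with R
tables <- list(cars = head(cars), pressure = head(pressure))

# One hypothetical output path per table, in a temporary directory
paths <- file.path(tempdir(), paste0(names(tables), ".csv"))

# walk2() calls write.csv(tables[[i]], paths[[i]], row.names = FALSE) for each i
walk2(tables, paths, write.csv, row.names = FALSE)

file.exists(paths)
```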
Putting together writing functions and walk
In the previous exercise, we hard-coded the breaks, but that was a little lazy. Those breaks probably won’t be great if we change the parameters of our simulation.
A better idea would be to generate reasonable breaks based on the actual values in our simulated samples. This is a great chance to review our function writing skills and combine our own function with purrr.
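Let’s start by writing our own function find_breaks(), which roughly copies the default binning in the ggplot2 package: split the range of the data into 30 equally spaced break points. (Strictly, 30 break points give 29 bins; ggplot2’s default of 30 bins would need length.out = 31.)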
How do we start? Simple, of course! Here’s a snippet of code that works for the first sample:
rng <- range(sims[[1]], na.rm = TRUE)
seq(rng[1], rng[2], length.out = 30)
## [1] 8.239979 8.374912 8.509845 8.644778 8.779711 8.914644 9.049577
## [8] 9.184510 9.319443 9.454376 9.589308 9.724241 9.859174 9.994107
## [15] 10.129040 10.263973 10.398906 10.533839 10.668772 10.803705 10.938638
## [22] 11.073571 11.208503 11.343436 11.478369 11.613302 11.748235 11.883168
## [29] 12.018101 12.153034
Your job in this exercise is to turn that snippet into a function.
In the next exercise, we’ll combine find_breaks() with map() and walk2() to create histograms with sensible breaks.
# Turn this snippet into find_breaks()
find_breaks <- function(x) {
rng <- range(x, na.rm = TRUE)
seq(rng[1], rng[2], length.out = 30)
}
# Call find_breaks() on sims[[1]]
find_breaks(sims[[1]])
## [1] 8.239979 8.374912 8.509845 8.644778 8.779711 8.914644 9.049577
## [8] 9.184510 9.319443 9.454376 9.589308 9.724241 9.859174 9.994107
## [15] 10.129040 10.263973 10.398906 10.533839 10.668772 10.803705 10.938638
## [22] 11.073571 11.208503 11.343436 11.478369 11.613302 11.748235 11.883168
## [29] 12.018101 12.153034
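Because the random samples change from run to run, it helps to test find_breaks() on a small input whose answer is known in advance. In this sketch, the vector c(29, 0, NA, 12) is an arbitrary test case. The range 0 to 29 split into 30 points should give exactly the integers 0 to 29, and the NA should be ignored because of na.rm = TRUE. The hist(..., plot = FALSE) call confirms that 30 break points produce 29 bins.

```r
find_breaks <- function(x) {
  rng <- range(x, na.rm = TRUE)
  seq(rng[1], rng[2], length.out = 30)
}

# Known case: range 0..29 with an NA that should be ignored
b <- find_breaks(c(29, 0, NA, 12))
length(b)
all.equal(b, 0:29)

# 30 break points define 29 bins
h <- hist(c(29, 0, 12), breaks = b, plot = FALSE)
length(h$counts)
```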
Nice breaks for all
Now that we have find_breaks(), we can find nice breaks for all the samples using map(). Then, pass the result into walk2() to get nice (but custom) breaks for each sample.
# Use map() to iterate find_breaks() over sims: nice_breaks
nice_breaks <- map(sims, find_breaks)
# Use nice_breaks as the second argument to walk2()
walk2(sims, nice_breaks, hist)
Now let’s fix those ugly labels!
Walking with many arguments: pwalk
Ugh! Nice breaks, but those plots had UUUUGLY labels and titles. The x-axis labels are easy to fix if we don’t mind every plot having its x-axis labeled the same way. We can use the ... argument to any of the map() or walk() functions to pass in further arguments to the function .f. In this case, we might decide we don’t want any labels on the x-axis, in which case we need to pass an empty string to the xlab argument of hist():
walk2(sims, nice_breaks, hist, xlab = "")
But what about the titles? We don’t want them to be the same for each plot. How can we iterate over the arguments x, breaks and main? You guessed it: there is a pwalk() function that works just like pmap().
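Like pmap(), pwalk() matches the names in its list to the function’s arguments, and like walk(), it returns its input invisibly. A small sketch without plotting, using a hypothetical list of sample names and sizes and an illustrative anonymous function:

```r
library(purrr)

# Hypothetical per-sample settings; names match the function's arguments
settings <- list(name = c("Normal", "Uniform", "Exp"), size = c(1000, 1000, 1000))

# pwalk() calls the function once per position, matching by name
returned <- pwalk(settings, function(name, size) {
  cat(name, "sample of size", size, "\n")
})

identical(returned, settings)  # TRUE
```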
Let’s use pwalk() to tidy up these plots. Also, let’s increase our sample size to 1000.
# Increase sample size to 1000
sims <- invoke_map(funs, params, n = 1000)
# Compute nice_breaks (don't change this)
nice_breaks <- map(sims, find_breaks)
# Create a vector nice_titles
nice_titles <- list("Normal(10, 1)", "Uniform(0, 5)", "Exp(5)")
# Use pwalk() instead of walk2()
pwalk(list(x = sims, breaks = nice_breaks, main = nice_titles), hist, xlab = "")
Walking with pipes
One of the nice things about the walk() functions is that they (invisibly) return the object you passed to them. This means they can easily be used in pipelines (a pipeline is just a short way of saying “a statement with lots of pipes”).
To illustrate, we’ll return to our first example of making histograms for each sample:
walk(sims, hist)
Take a look at what gets returned:
tmp <- walk(sims, hist)
str(tmp)
## List of 3
## $ Normal : num [1:1000] 8.62 10.45 10.87 8.54 9.26 ...
## $ Uniform: num [1:1000] 0.59 3.91 1.2 4.85 3.98 ...
## $ Exp : num [1:1000] 0.138 0.2606 0.0682 0.1957 0.2158 ...
It’s our original sims object. That means we can pipe the sims object along to other functions. For example, we might want some basic summary statistics on each sample as well as our histograms.
# Pipe this along to map(), using summary() as .f
sims %>%
  walk(hist) %>%
  map(summary)
## $Normal
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.975 9.378 10.016 10.010 10.679 12.968
##
## $Uniform
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.002439 1.228325 2.490056 2.479221 3.719553 4.992232
##
## $Exp
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000655 0.0602891 0.1440154 0.1980766 0.2660853 1.7336472
Session Info
sessionInfo()
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
## [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
## [5] LC_TIME=German_Switzerland.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] purrr_0.3.0 ggplot2_3.1.0 dplyr_0.8.0.1 gapminder_0.3.0
## [5] kableExtra_1.0.1 knitr_1.21
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.0 plyr_1.8.4 pillar_1.3.1
## [4] compiler_3.5.2 prettydoc_0.2.1 tools_3.5.2
## [7] digest_0.6.18 gtable_0.2.0 evaluate_0.12
## [10] tibble_2.0.1 viridisLite_0.3.0 pkgconfig_2.0.2
## [13] rlang_0.3.1 rstudioapi_0.9.0 yaml_2.2.0
## [16] xfun_0.4 withr_2.1.2 httr_1.4.0
## [19] stringr_1.4.0 xml2_1.2.0 hms_0.4.2
## [22] webshot_0.5.1 grid_3.5.2 tidyselect_0.2.5
## [25] glue_1.3.0 R6_2.4.0 rmarkdown_1.11
## [28] readr_1.3.1 magrittr_1.5 scales_1.0.0
## [31] htmltools_0.3.6 assertthat_0.2.0 rvest_0.3.2
## [34] colorspace_1.4-0 stringi_1.3.1 lazyeval_0.2.1
## [37] munsell_0.5.0 crayon_1.3.4